
    DiRE: identifying distant regulatory elements of co-expressed genes

    Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org

    Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

    BACKGROUND: Researchers seeking to unlock the genetic basis of human physiology and disease have been studying the regulation of gene transcription. The temporal and spatial patterns of gene expression are controlled mainly by non-coding elements known as cis-regulatory modules (CRMs) and by epigenetic factors. CRMs modulating related genes share a regulatory signature consisting of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is challenging because of the prohibitive number of sequence sets that must be analyzed. RESULTS: We formulated the challenge as a supervised classification problem, although experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner, which mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes make up the mixed set, whereas the control set consists of random genomic sequences. CrmMiner assumes that a large percentage of the mixed set consists of background sequences that do not include CRMs. The system identifies pairs of closely located motifs, representing vertebrate TFBSs, that are enriched in the training portion of the mixed set (50% of the gene loci). CrmMiner then selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets with signal-to-noise ratios as low as 1:25. CrmMiner was 21-75% more precise than a related CRM predictor. Its sensitivity in locating known human heart enhancers was as high as 83%, and its precision reached 82% when mining for CRMs specific to human CD4+ T cells. On several data sets, the system achieved 99% specificity. CONCLUSION: These results suggest that CrmMiner predictions are accurate and likely to be tissue-specific CRMs. We expect that the predicted tissue-specific CRMs and regulatory signatures will broaden our knowledge of gene transcription regulation.
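
    A minimal sketch of the motif-pair enrichment step described above, assuming each sequence has already been scanned for motifs and reduced to a hypothetical list of (motif, position) hits. The distance window, pseudocount and enrichment ratio are illustrative choices, not CrmMiner's actual statistics; the pair-selection and Bayesian classification steps are not shown, and the toy hits are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input format: one sequence = list of (motif_name, position_bp) hits,
# e.g. produced beforehand by scanning with vertebrate TFBS position weight matrices.
Hits = list[tuple[str, int]]


def pairs_in_sequence(hits: Hits, max_distance: int) -> set[tuple[str, str]]:
    """Unordered pairs of distinct motifs with hits at most max_distance bp apart."""
    found: set[tuple[str, str]] = set()
    for (m1, p1), (m2, p2) in combinations(hits, 2):
        if m1 != m2 and abs(p1 - p2) <= max_distance:
            a, b = sorted((m1, m2))
            found.add((a, b))
    return found


def pair_enrichment(mixed: list[Hits], control: list[Hits],
                    max_distance: int = 50,
                    pseudocount: float = 1.0) -> dict[tuple[str, str], float]:
    """Smoothed fraction of mixed sequences containing each pair divided by the
    fraction of control sequences containing it, sorted most-enriched first.
    A sequence counts a given pair at most once."""
    mixed_counts = Counter(p for seq in mixed for p in pairs_in_sequence(seq, max_distance))
    control_counts = Counter(p for seq in control for p in pairs_in_sequence(seq, max_distance))
    scores = {}
    for pair, n_mixed in mixed_counts.items():
        frac_mixed = (n_mixed + pseudocount) / (len(mixed) + 2 * pseudocount)
        frac_control = (control_counts[pair] + pseudocount) / (len(control) + 2 * pseudocount)
        scores[pair] = frac_mixed / frac_control
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy data (illustrative only).
mixed = [
    [("GATA4", 10), ("NKX2-5", 40)],
    [("GATA4", 100), ("NKX2-5", 120), ("SP1", 500)],
    [("SP1", 5)],
]
control = [
    [("GATA4", 10), ("NKX2-5", 300)],
    [("SP1", 20), ("GATA4", 30)],
]
print(pair_enrichment(mixed, control))  # {('GATA4', 'NKX2-5'): 2.4}
```

    Counting each pair once per sequence keeps a single repeat-rich locus from dominating the score; the pseudocount avoids division by zero for pairs never seen in the control set.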

    Twist1 Directly Regulates Genes That Promote Cell Proliferation and Migration in Developing Heart Valves

    Twist1, a basic helix-loop-helix transcription factor, is expressed in mesenchymal precursor populations during embryogenesis and in metastatic cancer cells. In the developing heart, Twist1 is highly expressed in endocardial cushion (ECC) valve mesenchymal cells and is downregulated during valve differentiation and remodeling. Previous studies demonstrated that Twist1 promotes cell proliferation, migration, and expression of primitive extracellular matrix (ECM) molecules in ECC mesenchymal cells. Furthermore, Twist1 expression is induced in human pediatric and adult diseased heart valves. However, the Twist1 downstream target genes that mediate increased cell proliferation and migration during early heart valve development remain largely unknown. Candidate gene and global gene profiling approaches were used to identify transcriptional targets of Twist1 during heart valve development. Candidate target genes were analyzed for evolutionarily conserved regions (ECRs) containing E-box consensus sequences that are potential Twist1 binding sites. ECRs containing conserved E-box sequences were identified for the Twist1-responsive genes Tbx20, Cdh11, Sema3C, Rab39b, and Gadd45a. Twist1 binding to these sequences in vivo was determined by chromatin immunoprecipitation (ChIP) assays, and binding was detected in ECCs but not in late-stage remodeling valves. In addition, the identified Twist1 target genes are highly expressed in ECCs and have reduced expression during heart valve remodeling in vivo, which is consistent with the expression pattern of Twist1. Together, these analyses identify multiple new genes involved in cell proliferation and migration that are differentially expressed in the developing heart valves, are responsive to Twist1 transcriptional function, and contain Twist1-responsive regulatory sequences.
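
    As an illustration of the candidate-screening step, the sketch below scans sequences for the E-box consensus CANNTG and keeps only sites found at the same columns of a pairwise alignment, a simple stand-in for "conserved E-box within an ECR". It assumes the ECR alignment is already available as two gapped, equal-length strings; the function names and sequences are hypothetical, and this is not the authors' pipeline.

```python
import re

# E-box consensus CANNTG (N = any base). Its reverse complement is also CANNTG,
# so scanning one strand finds sites on both strands.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")  # zero-width lookahead also reports overlapping sites
EBOX_SITE = re.compile(r"CA[ACGT]{2}TG")


def find_eboxes(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every E-box in seq (case-insensitive)."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]


def conserved_eboxes(aligned_a: str, aligned_b: str) -> list[int]:
    """Alignment columns where two gapped, equal-length sequences both carry an E-box.

    Gap characters ('-') never match the pattern, so a site interrupted by a gap
    in either sequence is not reported.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    b = aligned_b.upper()
    return [start for start, _ in find_eboxes(aligned_a)
            if EBOX_SITE.fullmatch(b, start, start + 6)]


print(find_eboxes("ggCAGATGttCACGTGaa"))                        # [(2, 'CAGATG'), (10, 'CACGTG')]
print(conserved_eboxes("CAGATGTT-CACGTG", "CAGCTGTTACAGGTA"))  # [0]
```

    Requiring an exact E-box in both species is deliberately strict; a real analysis would typically also require the surrounding region to meet an ECR conservation threshold, which is not modelled here.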

    Decoding a cancer-relevant splicing decision in the RON proto-oncogene using high-throughput mutagenesis

    Mutations causing aberrant splicing are frequently implicated in human diseases, including cancer. Here, we establish a high-throughput screen of randomly mutated minigenes to decode the cis-regulatory landscape that determines alternative splicing of exon 11 in the proto-oncogene MST1R (RON). Mathematical modelling of splicing kinetics enables us to identify more than 1000 mutations affecting skipping of RON exon 11, which corresponds to the pathological isoform RONΔ165. Importantly, the effects correlate with RON alternative splicing in cancer patients bearing the same mutations. Moreover, we highlight heterogeneous nuclear ribonucleoprotein H (HNRNPH) as a key regulator of RON splicing in healthy tissues and cancer. Using iCLIP and synergy analysis, we pinpoint the functionally most relevant HNRNPH binding sites and demonstrate how cooperative HNRNPH binding facilitates a splicing switch of RON exon 11. Our results thereby offer insights into splicing regulation and the impact of mutations on alternative splicing in cancer.
    Funding: Institute of Molecular Biology Core Facilities; DFG [ZA 881/2-1, KO 4566/4-1, LE 3473/2-1]; LOEWE program Ubiquitin Networks (Ub-Net) of the State of Hesse (Germany); Deutsche Forschungsgemeinschaft [SFB902 B13]; EMBO [3057]; Fundacao para a Ciencia e a Tecnologia, Portugal (FCT Investigator Starting Grant) [IF/00595/2014]; German Federal Ministry of Research (BMBF; e:bio junior group program) [FKZ: 0316196]; Boehringer Ingelheim Foundation; [INST 47/870-1 FUGG]

    A computational evaluation of over-representation of regulatory motifs in the promoter regions of differentially expressed genes

    BACKGROUND: Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for these motifs. Although the underlying hypothesis has frequently been explored in yeast, its validity has not been assessed systematically in mammals. This highlights the need for a systematic, quantitative evaluation of the degree to which co-expressed mammalian genes share over-represented motifs. RESULTS: We identified 33 experiments for human and mouse in the ArrayExpress Database in which transcription factors were manipulated and which exhibited a significant number of differentially expressed genes. We checked for over-representation of transcription factor binding sites in up- or down-regulated genes using the over-representation analysis tool oPOSSUM. In 25 of the 33 experiments, this procedure identified the binding matrices of the affected transcription factors. We also carried out de novo prediction of regulatory motifs shared by differentially expressed genes. Again, the detected motifs shared significant similarity with the matrices of the affected transcription factors. CONCLUSIONS: Our results support the claim that functional regulatory motifs are over-represented in sets of differentially expressed genes and that they can be detected with computational methods.
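
    Over-representation of a binding site among differentially expressed genes is commonly assessed with a one-sided hypergeometric (Fisher-type) test. The sketch below computes that tail probability with Python's standard library. It is a generic illustration under the assumption that each gene has simply been marked as having or lacking a predicted site, not oPOSSUM's implementation, and the gene names are placeholders.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K of them with a site, n genes drawn)."""
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than are available.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


def site_overrepresentation(target: set[str], background: set[str],
                            with_site: set[str]) -> float:
    """One-sided p-value that `target` genes carry a TF site more often than
    expected from the `background` gene set."""
    target = target & background        # only genes present in the background count
    with_site = with_site & background
    return hypergeom_sf(len(target & with_site), len(background),
                        len(with_site), len(target))


genes = {f"g{i}" for i in range(10)}                 # placeholder background
p = site_overrepresentation({"g0", "g1", "g2"},      # "differentially expressed" genes
                            genes,
                            {"g0", "g1", "g2", "g3"})  # genes with a predicted site
print(p)  # 3 of 3 targets carry a site; probability 4/120 = 1/30
```

    In practice the test is repeated for every binding matrix, so the resulting p-values would need a multiple-testing correction before deciding which factors are "identified".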

    Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements

    BACKGROUND: Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding, each of which has its own biases. RESULTS: Herein we show that an approach based on clustering transcription factor peaks from chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors, distributed over two cell lines, to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription through long-range interactions. The latter clusters were specifically enriched for H3K4me1 but less so for acetylation of lysine 27 on histone H3 (H3K27ac) or p300 binding. CONCLUSION: By integrating genome-wide data on transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
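
    The abstract does not give the exact clustering rule, so the following is only a generic sketch: peaks from all TF data sets are merged into one cluster whenever they overlap or lie within a hypothetical max_gap distance on the same chromosome, and each cluster records which TFs contribute to it. Coordinates and TF labels are toy values, and the later integration with histone marks is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Peak:
    chrom: str
    start: int  # 0-based, half-open coordinates, as in BED files
    end: int
    tf: str


@dataclass
class Cluster:
    chrom: str
    start: int
    end: int
    tfs: set[str] = field(default_factory=set)


def cluster_peaks(peaks: list[Peak], max_gap: int = 0) -> list[Cluster]:
    """Merge peaks (from any TF) that overlap or lie within max_gap bp of each other.

    max_gap is a hypothetical parameter; peaks are swept in sorted order and each
    one either extends the current cluster or starts a new one.
    """
    clusters: list[Cluster] = []
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        last = clusters[-1] if clusters else None
        if last is not None and last.chrom == p.chrom and p.start <= last.end + max_gap:
            last.end = max(last.end, p.end)
            last.tfs.add(p.tf)
        else:
            clusters.append(Cluster(p.chrom, p.start, p.end, {p.tf}))
    return clusters


peaks = [
    Peak("chr1", 100, 200, "CTCF"),
    Peak("chr1", 150, 260, "MAX"),
    Peak("chr1", 400, 450, "JUND"),
    Peak("chr2", 100, 150, "CTCF"),
]
for c in cluster_peaks(peaks):
    print(c.chrom, c.start, c.end, sorted(c.tfs))
# chr1 100 260 ['CTCF', 'MAX']
# chr1 400 450 ['JUND']
# chr2 100 150 ['CTCF']
```

    Increasing max_gap joins nearby clusters (with max_gap=200 the two chr1 clusters above become one), trading positional resolution for sensitivity to regulatory elements whose bound factors are spread over a wider region.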